Modular resource development and diagnostic evaluation framework for fast NLP system improvement

Authors

  • Gaël de Chalendar
  • Damien Nouvel
Abstract

Natural Language Processing systems are large-scale software whose development involves many man-years of work, in terms of both coding and resource development. Given a dictionary of 110k lemmas, a few hundred syntactic analysis rules, 20k n-gram matrices and other resources, what will be the impact on a syntactic analyzer of adding a new possible category to a given verb? What will be the consequences of adding a new syntactic rule? Any modification may have, besides its expected effect, unforeseeable side effects, and the complexity of the system makes it difficult to guess the overall impact of even small changes. We present a framework designed to improve the accuracy of our linguistic analyzer LIMA through iterative refinement of its linguistic resources. These improvements are continuously assessed by evaluating the analyzer's performance against a reference corpus. Our first results indicate that the framework is helpful towards this goal.
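
The idea of assessing each resource change against a reference corpus can be illustrated with a minimal Python sketch. It is not LIMA's actual evaluation code: the function names `accuracy` and `diff_runs`, the flat list-of-tags representation, and the toy tag sequences are all assumptions made for illustration. It compares the analyzer output before and after a change and lists which tokens were fixed and which were broken, so that side effects stay visible even when the overall score does not move.

```python
from typing import List, Tuple


def accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of tokens whose predicted tag matches the reference tag."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def diff_runs(before: List[str], after: List[str],
              reference: List[str]) -> Tuple[List[int], List[int]]:
    """Return (fixed, broken) token indices between two analyzer runs.

    fixed:  wrong before the resource change, correct after it.
    broken: correct before the resource change, wrong after it (side effects).
    """
    fixed, broken = [], []
    for i, (b, a, r) in enumerate(zip(before, after, reference)):
        if b != r and a == r:
            fixed.append(i)
        elif b == r and a != r:
            broken.append(i)
    return fixed, broken


# Hypothetical toy data: one tag per token of a five-token sentence.
reference = ["DET", "NOUN", "VERB", "DET", "NOUN"]
before = ["DET", "NOUN", "NOUN", "DET", "NOUN"]   # token 2 is mis-tagged
after = ["DET", "VERB", "VERB", "DET", "NOUN"]    # token 2 fixed, token 1 broken

fixed, broken = diff_runs(before, after, reference)
```

In this toy case the accuracy is the same before and after the change, yet `fixed` and `broken` show that one token improved and another regressed, which is the kind of side effect an aggregate score alone would hide.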

Similar resources

A Proposed Model for Evaluating Modular Education

Introduction: Educational evaluation is one of the main elements of educational systems. It also has a particular role in different educational standards such as ISO 10015. It seems that oral and written examinations are not enough for an effective evaluation of instructions. In order to accomplish an efficient educational evaluation, an evaluation model was designed for educational assessment ...

Development a Model for the Evaluation and Improvement of Key Human Resource Competencies Using the Grounded Theory

INTRODUCTION: The Red Crescent Society of the Islamic Republic of Iran is a human-centered organization. Therefore, the competencies of its employees must be improved to make it possible for them to have their best performances. In this regard, the present study aimed to investigate and identify the factors that affect the development of key human resource competencies in the Iranian Red Cresce...

Experimental Fast-Tracking of Morphological Analysers for Nguni Languages

The development of natural language processing (NLP) components is resource-intensive and therefore justifies exploring ways of reducing development time and effort when building NLP components. This paper addresses the experimental fast-tracking of the development of finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing prototype of a morphological a...

TectoMT: Modular NLP Framework

In the present paper we describe TectoMT, a multi-purpose open-source NLP framework. It allows for fast and efficient development of NLP applications by exploiting a wide range of software modules already integrated in TectoMT, such as tools for sentence segmentation, tokenization, morphological analysis, POS tagging, shallow and deep syntax parsing, named entity recognition, anaphora resolutio...

Assessment of Improvement of Preventive Maintenance Systems Related to the Civil Projects Using Concepts of Value Engineering (RESEARCH NOTE)

The purpose of this paper is using the concepts of value engineering (VE) in evaluating the improvement caused by preventive maintenance (PM) systems in construction project. A real case is used to show how we can implement the proposed method. VE is the systematic application of recognized techniques by multi-disciplined teams that identifies the function of a product or service, establishes a...

Publication date: 2009